ABSTRACT 

Motivation: Biologistics provides data for quantitative analysis of 
transport (diffusion) processes and their spatio-temporal correlations 
in cells. Mobility of proteins is one of the few parameters necessary to 
describe reaction rates for gene regulation. Although understanding of 
diffusion-limited biochemical reactions in vivo requires mobility data 
for the largest possible number of proteins in their native forms, cur- 
rently, there is no database that would contain the complete informa- 
tion about the diffusion coefficients (DCs) of proteins in a given cell 
type. 

Results: We demonstrate a method for the determination of in vivo 
DCs for any molecule, regardless of its molecular weight, size and 
structure, in any type of cell. We exemplify the method with the 
database of in vivo DCs for all proteins (4302 records) from the proteome 
of the K12 strain of Escherichia coli, together with examples of DCs of amino 
acids, sugars, RNA and DNA. The database follows from the 
scale-dependent viscosity reference curve (sdVRC). Construction of 
sdVRC for prokaryotic or eukaryotic cell requires ~20 in vivo meas- 
urements using techniques such as fluorescence correlation spectro- 
scopy (FCS), fluorescence recovery after photobleaching (FRAP), 
nuclear magnetic resonance (NMR) or particle tracking. The shape 
of the sdVRC would be different for each organism, but the mathem- 
atical form of the curve remains the same. The presented method has 
a high predictive power, as measurements of the DCs of several inert, 
properly chosen probes in a single cell type allow the DCs of 
thousands of proteins to be determined. Additionally, the obtained mobility 
data allow quantitative study of biochemical interactions in vivo. 
Contact: rholyst@ichf.edu.pl 

Supplementary information: Supplementary data are available at 
Bioinformatics Online. 

1 INTRODUCTION 

Biologistics and biochemistry in a crowded environment are two 
emerging interdisciplinary fields of science. They provide quanti- 
tative analysis of transport of proteins and their spatio-temporal 
correlations involved in gene expression and regulation. 
According to the current state-of-the-art theory of gene expres- 
sion (activation or repression) in bacteria (Elf et al., 2007; Li 
et al., 2009), mobility of proteins is one of the few parameters 
necessary to describe reaction rates of gene regulation. The 
mobility is understood as three-dimensional diffusion or 
one-dimensional sliding along DNA (in prokaryotes and 
eukaryotes), or as the velocity of molecular motors (in eukaryotic 
cells). Understanding of diffusion-limited biochemical reactions 
requires accurate in vivo mobility data for the largest possible 
number of proteins in their native forms. The three-dimensional 
diffusion of different types of macromolecules in the cytoplasm 
of Escherichia coli has been experimentally studied in several 
cases (Bakshi et al., 2012; Campbell and Mullins, 2007; 
Cluzel et al., 2000; Derman et al., 2008; Elowitz et al., 1999; 
English et al., 2011; Golding and Cox, 2004; Jasnin et al., 2008; 
Konopka et al., 2006; Kumar et al., 2010; Mika et al., 2010; 
Mullineaux et al., 2006; Nenninger et al., 2010; Slade et al., 
2009; van den Bogaart et al., 2007), but experimental determination 
of the mobility of all proteins is technically impossible 
because of their large number in a given cell. For example, 
the proteome of the K12 strain of E. coli (Blattner et al., 1997) 
contains more than 4300 proteins. Moreover, most of the recent 
studies concern measurements mainly performed with the use of 
green fluorescent protein (GFP) (Elowitz et al., 1999; Konopka 
et al., 2006; Kumar et al., 2010; Mika et al., 2010; Nenninger 
et al., 2010; Slade et al., 2009; van den Bogaart et al., 2007) or 
GFP fusion proteins (Jennifer et al., 2001). 

Attempts to study the diffusion of many proteins simultan- 
eously, under conditions resembling the interior of the cells, 
were performed in silico by McGuffee and Elcock (2010). 
Computational methods, however, have limitations arising 
from the speed and capacity of computing hardware and small 
number of interacting proteins in the system (~50 different types 
of proteins) (McGuffee and Elcock, 2010). An alternative ap- 
proach is the quantitative analysis of available literature data. 
Mika and Poolman (2011) gathered literature data of diffusion 
coefficients (DCs) of ~20 different types of proteins in E. coli 
and proposed a power law dependence of the DC on the molecular 
weight of proteins. This power law (Mika and Poolman, 2011), 
however, can be applied only to proteins in a narrow 
range of molecular weights, i.e. between 20 and 30 kDa. 

In this work, we present a method for predictions of the DCs 
of proteins for the proteome of any cell. We collected all 
available literature data (Bakshi et al., 2012; Campbell and 
Mullins, 2007; Cluzel et al., 2000; Derman et al., 2008; Elowitz 
et al., 1999; English et al., 2011; Golding and Cox, 2004; Jasnin 
et al., 2008; Konopka et al., 2006; Kumar et al., 2010; 
Mika et al., 2010; Mullineaux et al., 2006; Nenninger et al., 
2010; Slade et al., 2009; van den Bogaart et al., 2007) on diffusion 
of various probes, including small molecules (water, glucose), 
proteins and plasmids, in the cytoplasm of E. coli. We used 
those data and the scaling function of viscosity (Holyst et al., 
2009; Kalwarczyk et al., 2011; Szymanski et al., 2006a, b) to 
predict the mobility of macromolecules in the bacterial cytoplasm. 
We also predicted the DCs of amino acids, sugars, proteins and 
DNA. We created a unique database, including the DCs of all 
proteins of strain K12 of E. coli (4302 proteins), their oligomers 
and their potential complexes with translocation proteins; 6600 
records in total. 



2 METHODS 

2.1 A brief description of the method 

Our predictions of DCs of proteins in the bacterial cytoplasm are based 
on experimental data on diffusion in the cytoplasm of E. coli available in 
the literature (Bakshi et al., 2012; Campbell and Mullins, 2007; Cluzel 
et al., 2000; Derman et al., 2008; Elowitz et al., 1999; English et al., 2011; 
Golding and Cox, 2004; Jasnin et al., 2008; Konopka et al., 2006; Kumar 
et al., 2010; Mika et al., 2010; Mullineaux et al., 2006; Nenninger et al., 
2010; Slade et al., 2009; van den Bogaart et al., 2007). The method relies 
on the dependence $D_0/D_{cyto} = \eta/\eta_0$, where $D_0$ is the DC of a 
macromolecule in water of viscosity $\eta_0$, $D_{cyto}$ is the DC of the 
macromolecule in the cytoplasm and $\eta$ is the effective viscosity 
experienced by the macromolecule during diffusion in the cytoplasm. The 
protocol for determining DCs is represented graphically in Figure 1. 

2.2 Calculation of hydrodynamic radii and DCs in water 

The hydrodynamic radius of proteins was determined using the following 
formula (Dill et al., 2011): 

$r_p = 0.0515\,M_w^{0.392}$ [nm], (1)

while for RNA we used Equation (2) (Werner, 2011): 

$r_p = 0.0566\,M_w^{0.38}$ [nm]. (2)

Fig. 1. Diagram of the method of predicting the DC of any molecule in 
the cell cytoplasm. To predict the DCs of molecules in the cytoplasm, 
it is essential to correctly select the probes that will be used to determine 
the reference curve. Next, one needs to measure the DCs of the selected 
probes in water (buffer), $D_0$, and in the cytoplasm of the studied cell, 
$D_{cyto}$. Using $D_0$ and $D_{cyto}$, we create the sdVRC. To predict the DC 
of a given molecule, it is necessary to know its hydrodynamic radius $r_p$ 
or $D_0$. Although the sdVRC depends on both $r_p$ and $D_0$, in practice both 
parameters can be calculated knowing only one of them. Finally, by 
substituting the values of $r_p$ and $D_0$ into the sdVRC, the DC in the 
cytoplasm $D_{cyto}$ can be determined 



Dependence of the hydrodynamic radii of linear, circular or supercoiled 
DNA on molecular weight [Equations (3)-(5), respectively] was 
obtained from DCs of DNA constructs (Robertson et al., 2006) using 
Equation (6). 

$r_p = 0.024\,M_w^{0.57}$ [nm], (3)

$r_p = 0.0125\,M_w^{0.59}$ [nm], (4)

$r_p = 0.0145\,M_w^{0.57}$ [nm]. (5)

Radii of amino acids and sugars were calculated assuming that the 
hydrodynamic radius $r_p$ corresponds to the van der Waals radius $r_w$ 
calculated according to the procedure described elsewhere (Zhao et al., 
2003). 

For each probe, we use the literature values of $D_{cyto}$, while the values 
of $D_0$ (if not available) were calculated using the Stokes-Sutherland- 
Einstein equation [Equation (6)]. 

$D_0 = \frac{kT}{6\pi\eta_0 r_p}$ (6)
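
Equations (1)-(6) can be chained into a short calculation. The sketch below is illustrative only and is not the authors' code: the function and dictionary names are hypothetical, the prefactors and exponents are those quoted in the text and in the caption of Figure 4, and it assumes that $M_w$ enters in Da (a choice that reproduces the 2.8 nm listed for 27 kDa GFP in Table 1), that T = 310 K and that the viscosity of water is 0.7 mPa·s.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

# (C, alpha) in r_p = C * M_w**alpha, with r_p in nm and M_w assumed in Da.
RADIUS_LAWS = {
    "protein": (0.0515, 0.392),          # Equation (1)
    "rna": (0.0566, 0.38),               # Equation (2)
    "dna_linear": (0.024, 0.57),         # Equation (3)
    "dna_circular": (0.0125, 0.59),      # Equation (4)
    "dna_supercoiled": (0.0145, 0.57),   # Equation (5)
}


def hydrodynamic_radius_nm(mw_da, kind="protein"):
    """Hydrodynamic radius in nm from molecular weight in Da."""
    c, alpha = RADIUS_LAWS[kind]
    return c * mw_da ** alpha


def d0_um2_per_s(r_p_nm, temperature_k=310.0, eta0_pa_s=0.7e-3):
    """Stokes-Sutherland-Einstein DC in water, Equation (6), in um^2/s."""
    r_m = r_p_nm * 1e-9                                   # nm -> m
    d_m2_s = K_B * temperature_k / (6.0 * math.pi * eta0_pa_s * r_m)
    return d_m2_s * 1e12                                  # m^2/s -> um^2/s


if __name__ == "__main__":
    r_gfp = hydrodynamic_radius_nm(27_000)                # 27 kDa GFP
    print(f"r_p = {r_gfp:.2f} nm, D_0 = {d0_um2_per_s(r_gfp):.0f} um^2/s")
```

Keeping the radius laws in one table makes it easy to switch between macromolecule types without touching the Stokes-Sutherland-Einstein step.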



2.3 Calculation of DCs of various molecules in the 
cytoplasm of E. coli 

Using the molecular weights from the Uniprot protein database (Apweiler 
et al., 2011; Jain et al., 2009), we calculated the DCs for the complete 
proteome of E. coli (K12 strain). We identified the cellular localization of 
each protein as well as its quaternary structure (a single polypeptide chain 
or multiple chain aggregates or complexes). In the case of membrane or 
periplasmic proteins, we adopted the assumption that, after synthesis, the 
proteins diffuse via the cytoplasm to their target in the membrane, through 
one of two transport pathways [twin-arginine translocation (TAT) or the 
general secretion system (Sec)] (Driessen and Nouwen, 2008; Sargent, 
2007). Consequently, these proteins were considered as single polypeptide 
chains (the TAT pathway) or protein complexes with SecB or Tig proteins 
(the Sec pathway). The hydrodynamic radius of proteins was determined 
using Equation (1). When the protein was composed of several 
subunits, the molecular weights of all polypeptide chains comprising the 
protein were added together. On the basis of the cumulative molecular 
weight of the complex, the hydrodynamic radius of the protein $r_p$ and then 
its DC in water $D_0$ were calculated [Equations (1) and (6)]. Then, using 
Equation (7), we calculated the relative DCs $D_0/D_{cyto}$ for all analysed 
proteins and, from them, the DCs of proteins in the cytoplasm $D_{cyto}$. The 
calculated DCs of all proteins in the cytoplasm are summarized in 
Supplementary Table S1. 



3 RESULTS AND DISCUSSION 

3.1 Construction of the scale-dependent viscosity reference 
curve 

We collected the literature data (Bakshi et al., 2012; Campbell 
and Mullins, 2007; Cluzel et al., 2000; Elowitz et al., 1999; 
English et al., 2011; Golding and Cox, 2004; Jasnin et al., 
2008; Konopka et al., 2006; Kumar et al., 2010; Mika et al., 
2010; Mullineaux et al., 2006; Nenninger et al., 2010; Slade 
et al., 2009; van den Bogaart et al., 2007) for DCs of different 
solutes and macromolecules in the cytoplasm of E. coli (Fig. 2 
and Table 1). We used the least squares method to fit those data 
with Equation (7) (Kalwarczyk et al., 2011). 

Fig. 2. The sdVRC. The logarithm of the viscosity $\eta$ divided by the 
viscosity of water $\eta_0$ [$\ln(\eta/\eta_0) = \ln(D_0/D_{cyto})$] as a function 
of the hydrodynamic radius $r_p$ of various probes (Table 1) with radii from 
0.16 nm to 203 nm (closed squares). The cytoplasmic DCs $D_{cyto}$ of the 
probes were taken from the literature (Bakshi et al., 2012; Campbell and 
Mullins, 2007; Cluzel et al., 2000; Elowitz et al., 1999; English et al., 
2011; Golding and Cox, 2004; Jasnin et al., 2008; Konopka et al., 2006; 
Kumar et al., 2010; Mika et al., 2010; Mullineaux et al., 2006; Nenninger 
et al., 2010; Slade et al., 2009; van den Bogaart et al., 2007) (cf. Table 1). 
By fitting the data with Equation (7) (solid line), we determined two length 
scales: $\xi = 0.51 \pm 0.09$ nm and $R_h = 42 \pm 9$ nm. We also determined 
the macroscopic viscosity of the cytoplasm $\eta_m = 17.5$ Pa·s, i.e. 26 000 
times higher than the viscosity of water $\eta_0$ at 310 K. Shading represents 
the maximum error of fitting 



Equation (7), as given by Kalwarczyk et al. (2011), reads 

$D_0/D_{cyto} = \eta/\eta_0 = \exp\left[\left(\xi^2/R_h^2 + \xi^2/r_p^2\right)^{-a/2}\right]$, (7)

where $r_p$ is the hydrodynamic radius of the probe, and $R_h$ and $\xi$ 
are length scales characterizing the cytoplasm. $\xi$ (an average distance 
between surfaces of proteins), $R_h$ (average hydrodynamic 
radius of the biggest crowders) and $a$ (a constant of the order of 
one) are the fitting parameters whose values for the cytoplasm of 
E. coli are as follows: $\xi = 0.51 \pm 0.09$ nm, $R_h = 42 \pm 9$ nm and 
$a = 0.53 \pm 0.04$. From the scale-dependent viscosity reference 
curve (sdVRC), we directly determined the macroscopic viscosity 
$\eta_m$ of the cytoplasm. We found that $\eta_m = 17.5$ Pa·s (26 000 
times greater than the viscosity of water, $\eta_0 \approx 0.7$ mPa·s at 
310 K). $R_h$ is comparable to the radius of the loops (Kim et al., 
2004) of DNA covered with proteins. The second length scale 
determined from the sdVRC, $\xi$, is comparable to the average distance 
between surfaces of proteins. $R_h$ determines the length 
scale above which the viscosity ceases to depend on the size of 
the probe and reaches the macroscopic value. For a probe smaller 
than $\xi$, the experienced viscosity has a value comparable to 
the viscosity of water. 

We used the sdVRC obtained in this way [Equation (7)] as a tool for 
predicting the DCs of all known proteins of the K12 strain 
(Blattner et al., 1997) of E. coli as well as of other molecules and 
macromolecules. 
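
How the reference curve turns a DC in water into a cytoplasmic DC can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes that Equation (7) has the form $\ln(D_0/D_{cyto}) = (\xi^2/R_h^2 + \xi^2/r_p^2)^{-a/2}$ given by Kalwarczyk et al. (2011), uses the central values of the fitted parameters quoted above and ignores their uncertainties. The function names are hypothetical.

```python
import math

# Central values of the fitted parameters for E. coli quoted in the text.
XI_NM = 0.51    # xi, nm
R_H_NM = 42.0   # R_h, nm
A_EXP = 0.53    # exponent a


def ln_viscosity_ratio(r_p_nm, xi=XI_NM, r_h=R_H_NM, a=A_EXP):
    """ln(eta/eta_0) = ln(D_0/D_cyto) for a probe of radius r_p (assumed form)."""
    return ((xi / r_h) ** 2 + (xi / r_p_nm) ** 2) ** (-a / 2.0)


def d_cyto(d0, r_p_nm, xi=XI_NM, r_h=R_H_NM, a=A_EXP):
    """Cytoplasmic DC, in the same units as d0."""
    return d0 / math.exp(ln_viscosity_ratio(r_p_nm, xi, r_h, a))


if __name__ == "__main__":
    for r in (0.3, 2.8, 16.6, 200.0):
        print(f"r_p = {r:6.1f} nm  ln(D_0/D_cyto) = {ln_viscosity_ratio(r):.2f}")
```

For probes much larger than $R_h$ the exponent saturates at $(R_h/\xi)^a$, which is the macroscopic limit described in the text; for probes much smaller than $\xi$ it tends to zero, so the probe feels the viscosity of water.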

3.2 Interpretation of sdVRC 

For more than a decade, diffusion of various proteins in the 
cytoplasm of E. coli has been studied (Table 1) (Bakshi et al., 
2012; Campbell and Mullins, 2007; Cluzel et al., 2000; 
Elowitz et al., 1999; English et al., 2011; Golding and Cox, 
2004; Jasnin et al., 2008; Konopka et al., 2006; Kumar et al., 
2010; Mika et al., 2010; Mullineaux et al., 2006; Nenninger et al., 
2010; Slade et al., 2009; van den Bogaart et al., 2007). Those 
experimental data show that the DCs depend exponentially on 
the size of the diffusing molecule. For example, GFP with a 
molecular weight $M_w = 27$ kDa and hydrodynamic radius 
$r_p = 2.8$ nm is characterized by a cytoplasmic DC (Elowitz et al., 
1999) $D_{cyto} = 7.7 \pm 2.5$ μm²/s. On the other hand, the DC of a 
large oligomeric protein consisting of four subunits of 
GFP-tagged β-galactosidase, (β-gal-GFP)4, of radius almost 
three times greater than GFP ($M_w \approx 580$ kDa, $r_p = 7.3$ nm), is 
equal to 0.7 ± 0.22 μm²/s (Mika et al., 2010). The above differences 
are explained in terms of the scale-dependent viscosity 
(Kalwarczyk et al., 2011) experienced by the diffusing molecule 
[cf. sdVRC, Equation (7)]. Equation (7) is an empirical equation 
originally found for synthetic systems such as polymer or micellar 
solutions (Holyst et al., 2009; Kalwarczyk et al., 2011; Szymanski 
et al., 2006a, b). The interpretation of the four parameters in Equation 
(7) ($R_h$, $\xi$, $\eta_m$ and $\eta_0$) is taken from those studies (Holyst et al., 
2009; Kalwarczyk et al., 2011; Szymanski et al., 2006a, b). In 
synthetic systems, $\xi$ is the average distance between macromolecular 
components of the complex liquid and $R_h$ is equal to the 
hydrodynamic radius of a polymer random coil or of a micelle. 
In the sdVRC, both $\xi$ and $R_h$ determine the viscosity experienced by 
a probe diffusing in the investigated liquid. For $r_p \gg R_h$, the 
probe experiences the macroscopic viscosity $\eta_m$. A probe of 
radius $r_p$ smaller than $\xi$ moving in the liquid experiences the 
viscosity of the solvent $\eta_0$. On the other hand, a probe of 
$r_p > \xi$ will experience a viscosity higher than the viscosity of the 
solvent. Finally, the effective viscosity experienced by a probe 
of radius between $\xi$ and $R_h$ ($\xi < r_p < R_h$) depends exponentially 
on $r_p$. In the case of the cytoplasm of mammalian cells, $R_h$ corresponds 
to the hydrodynamic radius of the filaments forming the 
cellular cytoskeleton in the volume of the cytoplasm 
(Kalwarczyk et al., 2011). The bacterial cytoskeleton (Shih and 
Rothfield, 2006), however, is located directly next to the inner 
membrane (Pogliano, 2008). We can therefore assume that it 
should not contribute strongly to the viscosity experienced 
by the proteins diffusing across the cytoplasm. This assumption 
is also supported by the value of $R_h = 42 \pm 9$ nm determined 
from fitting, which is similar to the radius of the objects identified 
as fragments of the bacterial nucleoid (around 40 nm) (Kim 
et al., 2004), i.e. loops of DNA covered with structural proteins. 
This value can be compared with the value of the hydrodynamic 
radius of the filaments forming the bacterial cytoskeleton (Hou 
et al., 2012; Pogliano, 2008) (fragments of length L = 100 nm and 
a radius r = 2.5 nm), which is ~17 nm (Vandesande and 
Persoons, 1985), well below the $R_h$ obtained from the fit. 
Therefore, the length scale $R_h$ is correlated neither with the 
hydrodynamic radius of the filaments nor with the proteins, 
whose highest hydrodynamic radius is about 10 nm. $\xi$ in the 
cytoplasm of E. coli equals 0.51 ± 0.09 nm and is comparable 
with the average distance between proteins. Parameters of the 
sdVRC ($\xi$ and $R_h$) depend on the internal structure of the cytoplasm 
(protein density, size of the nucleoid, etc.). Thus, each cell 
type will be characterized by a different shape of the reference 
curve (due to differences in parameters $\xi$ and $R_h$), while the 
mathematical form of the sdVRC will not change, and such a 
curve can be constructed for other cell types. 
Table 1. Data used in the construction of the sdVRC (cf. Figure 2) 

Probe            M_w (kDa)   r_p (nm)   ln(D_0/D_cyto)   Reference 
Water                0.018       0.16        0.1         Jasnin et al. (2008) 
Glucose              0.423       0.53        2.1         Mika et al. (2010) 
mEos2               26           2.8         2.1         English et al. (2011) 
EYFP                27           2.8         2.4         Kumar et al. (2010) 
GFP                 27           2.8         2.4         Elowitz et al. (1999) 
GFP                 27           2.8         3.2         Elowitz et al. (1999) 
GFP                 27           2.8         2.2         van den Bogaart et al. (2007) 
GFP                 27           2.8         2.6         Slade et al. (2009) 
GFP2                27           2.8         2.3         Nenninger et al. (2010) 
GFP                 27           2.8         3.2         Mika et al. (2010) 
GFP                 27           2.8         2.7         Konopka et al. (2006) 
GFP-His6            28           2.8         3.1         Elowitz et al. (1999) 
torA-GFP            30           2.9         2.5         Mullineaux et al. (2006) 
CheY-GFP            41           3.3         2.8         Cluzel et al. (2000) 
NlpA-GFP            55           3.7         3.4         Nenninger et al. (2010) 
NlpAnoLB-GFP        55           3.7         3.2         Nenninger et al. (2010) 
torA-GFP2           57           3.8         2.2         Nenninger et al. (2010) 
torA-GFP2           57           3.8         2.1         Nenninger et al. (2010) 
AmiA-GFP            58           3.8         3.6         Nenninger et al. (2010) 
AmiA-GFP            58           3.8         3.6         Nenninger et al. (2010) 
AmiAnoSP-GFP        58           3.8         2.2         Nenninger et al. (2010) 
CFP-CheW-YFP        71           4.1         3.5         Kumar et al. (2010) 
cMBP-GFP            72           4.1         3.2         Elowitz et al. (1999) 
torA-GFP3           84           4.4         2.2         Nenninger et al. (2010) 
CFP-CheR-YFP        86           4.4         3.3         Kumar et al. (2010) 
torA-GFP4          111           4.9         2.2         Nenninger et al. (2010) 
torA-GFP5          138           5.3         2.8         Nenninger et al. (2010) 
(β-Gal-GFP)4       582           9.4         3.5         Mika et al. (2010) 
Ribosome 70S     2,500          16.6         6.0         Bakshi et al. (2012) 
mRNA-GFP         6,000          21.3         6.2         Golding and Cox (2004) 
Plasmid-GFP     18,480         203.9        10.1         Campbell and Mullins (2007) 




3.3 Other models of diffusion in the cytoplasm 

We compared our results with three models of diffusion in the 
cytoplasm of E. coli available in the literature (Figures 3 and 4). 
McGuffee and Elcock (2010) proposed two models of diffusion 
in the cytoplasm: the 'steric' model, which takes into account 
only steric interactions between diffusing proteins, and the 
'full' model, which includes steric, electrostatic and hydrodynamic 
interactions between diffusing entities. Comparison of 
the results (Figure 3) shows that the model we propose takes into 
account possible interactions between the diffusing probes and 
the surrounding environment. Moreover, we show that the full 
information needed to build the sdVRC can be obtained only 
after taking into account the probes whose $r_p$ greatly exceeds $R_h$. 
For example, simulations conducted by McGuffee and Elcock 
(2010) include proteins that are most abundant in the cytoplasm, 
but the absence of large objects such as the nucleoid leads to 
underestimated values of $D_0/D_{cyto}$. The effect starts to be meaningful 
for probes whose $r_p > 10$ nm. In that case, the values of 
$D_0/D_{cyto}$ are lower by an order of magnitude with respect to 
experimental results. 

We also compared our results with the model proposed by 
Mika and Poolman (2011), where $D_{cyto} \propto M_w^{-0.7}$. As can be 
seen, the power law dependence of $D_{cyto}$ on $M_w$ may also lead 
to underestimated values of $D_0/D_{cyto}$. For example, for the ribosome 
70S, $D_0/D_{cyto}$ measured experimentally is five times higher 
than predicted using the power law dependence. Therefore, the 
power law dependence proposed by Mika and Poolman (2011) 
holds for proteins in a small range of molecular weights, 
20-30 kDa, and, moreover, is not applicable to macromolecules 
other than proteins. This is because each type of macromolecule 
(DNA, RNA, proteins, polymers, etc.) has a different shape and 
thus a different dependence of $r_p$ on $M_w$ [Equations (1)-(5)]. The 
shape of the macromolecule and, in consequence, its radius 
translate into the DC. The dependence of the DC $D_{cyto}$ of different 
types of macromolecules (proteins, RNA and DNA) on their 
molecular weight is shown in Figure 4. 

3.4 Accuracy of the model 

Accuracy in determination of the sdVRC strongly depends on 
the amount of available data. One would expect that increasing 
the amount of data for probes of $r_p > R_h$ and $r_p < \xi$ would 
significantly decrease the maximum error of the sdVRC (compare 
Fig. 2). 

Fig. 3. The comparison of the sdVRC with other existing models. The plot 
shows the literature values for the logarithm of $D_0/D_{cyto}$ (open squares) 
(Bakshi et al., 2012; Campbell and Mullins, 2007; Cluzel et al., 2000; 
Elowitz et al., 1999; English et al., 2011; Golding and Cox, 2004; 
Jasnin et al., 2008; Konopka et al., 2006; Kumar et al., 2010; Mika 
et al., 2010; Mullineaux et al., 2006; Nenninger et al., 2010; Slade et al., 
2009; van den Bogaart et al., 2007). The black solid line represents 
Equation (7) with parameters $\xi = 0.51 \pm 0.09$ nm, $R_h = 42 \pm 9$ nm and 
$a = 0.53 \pm 0.04$. We compared our results with data generated by 
McGuffee and Elcock (2010) and Mika and Poolman (2011). The data 
generated by McGuffee and Elcock (2010) were fitted by Equation (7), 
yielding the following parameters: for the 'full' model $\xi = 0.2 \pm 0.2$ nm, 
$R_h = 20 \pm 48$ nm and $a = 0.32 \pm 0.12$ (dotted circle, dotted line); for the 
'steric' model $\xi = 3.57 \pm 0.1$ nm, $R_h = 17 \pm 6$ nm and $a = 0.59 \pm 0.05$ 
(open diamond, dashed line). The model proposed by Mika and 
Poolman (2011), where $D_{cyto} \propto M_w^{-0.7}$, is plotted as a dashed-dotted line 



To test the accuracy of the presented method, we performed an 
analysis of the error of calculation of the DC, $\delta D_{cyto}$, for GFP as a 
function of the number of experimental data points. Using 
Equation (7), we generated 10 datasets, where the number of 
data points ranges from 6 to 100. The generated data were uniformly 
distributed on a logarithmic scale and were randomly 
drawn on the assumption that measurement error is described 
by a normal distribution with standard deviation $\sigma = 0.1$. We 
assumed that the error of $r_p$ equals 5%. We found that 20 data 
points are sufficient to obtain $\delta D_{cyto}$ at the level of 20% for 
GFP (averaged over 10 generated datasets). In comparison, 
$\delta D_{cyto}$ obtained from the analysis of the literature data was at 
the level of 40% (cf. Fig. 2). This is mainly because of the small 
number of available experimental data. Furthermore, most of the 
experimental data are available for a narrow range of hydrodynamic 
radii (around 3 nm, cf. Fig. 2), which is not preferred 
in this type of analysis. To date, however, there are no experimental 
data that would improve the accuracy of the sdVRC. 
Therefore, to improve the accuracy, additional experiments are 
needed to cover a wider range of $r_p$ of the probes, and the 
uncertainties of $D_0$, $D_{cyto}$ and $r_p$ should be minimized. 

3.5 DCs of proteins 

When preparing a database of DCs for the entire proteome, one should 
keep in mind that about 45% of the proteome are proteins 



Fig. 4. Comparison of measured and predicted $D_{cyto}$ as a function of 
molecular weight of the investigated probes. Predicted dependencies 
shown in the graph are expressed by Equation (7). The hydrodynamic 
radius $r_p$ of each type of macromolecule is given by the relationship 
$r_p = C M_w^{\alpha}$ nm, where $M_w$ is the molecular weight of the macromolecule. 
For proteins C = 0.0514 and $\alpha$ = 0.392 [Equation (1)]; RNA C = 0.0566 
and $\alpha$ = 0.38 [Equation (2)]; linear DNA C = 0.024 and $\alpha$ = 0.57 
[Equation (3)]; circular DNA C = 0.0125 and $\alpha$ = 0.59 [Equation (4)]; 
supercoiled DNA C = 0.0145 and $\alpha$ = 0.57 [Equation (5)]. For comparison, 
we present experimental data on DCs of proteins (Cluzel et al., 2000; 
Elowitz et al., 1999; English et al., 2011; Konopka et al., 2006; Kumar 
et al., 2010; Mika et al., 2010; Mullineaux et al., 2006; Nenninger et al., 
2010; Slade et al., 2009), RNA (Golding and Cox, 2004), plasmid 
(Campbell and Mullins, 2007) and ribosomes 30S and 70S (Bakshi 
et al., 2012). The dashed-dotted straight line indicates the relationship 
$D \propto M_w^{-0.7}$ proposed by Mika and Poolman (2011). The dependence of 
$D_{cyto}$ on $M_w$ proposed by Mika and Poolman (2011), when applied to 
large plasmids ($M_w \approx 2 \times 10^4$ kDa), yields an overestimation of the DC 
by several orders of magnitude 



forming a larger macromolecular complex (homo-, hetero-oligomers 
and complexes of membrane proteins with translocation 
proteins). Thus, the calculation of DCs of proteins should 
also be carried out for protein complexes. The Uniprot protein 
database (Apweiler et al., 2011; Jain et al., 2009) contains information 
on the molecular weight of proteins, their quaternary 
structure and their location in the cell. Using these data and the 
sdVRC (cf. Fig. 2), we calculated the DCs $D_{cyto}$ of all proteins 
in E. coli (Supplementary Table S1) present in the cytoplasm as 
monomers (single polypeptide chains), as multimers (homo- or 
hetero-oligomers) or as complexes composed of many chains (see 
Fig. 5). Figure 5A shows the histogram of molecular weights of 
cytoplasmic proteins, including homo- and hetero-multimers. 
The distribution of molecular weights of proteins is given by a 
log-normal distribution with probability density function 
$g(M_w) = (\sqrt{2\pi}\,\sigma M_w)^{-1} \exp[-(\ln(M_w/\mu))^2/(2\sigma^2)]$, where the 
standard deviation $\sigma = 0.825 \pm 0.007$ and the mean molecular weight 
$\mu = 31.9 \pm 0.3$ kDa. The relationship between the DC and the 
molecular weight of a protein is expressed by Equations (1) and 
(7). A histogram of DCs of cytoplasmic proteins is shown in 
Figure 5B (same proteins as in Fig. 5A). The distribution follows 
the curve given by the probability density function: 

$p(D_{cyto}(M_w)) = g(M_w)\,|dM_w(D_{cyto})/dD_{cyto}|$. 
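
The log-normal density $g(M_w)$ is straightforward to evaluate numerically. The sketch below is illustrative only: the function name is hypothetical and the defaults are the central values of $\mu$ and $\sigma$ quoted in the text. Because the density is written in terms of $\ln(M_w/\mu)$, $\mu$ plays the role of the median of the distribution and the maximum of $g$ lies at $\mu e^{-\sigma^2}$.

```python
import math


def lognormal_pdf(mw_kda, mu_kda=31.9, sigma=0.825):
    """Log-normal density g(M_w) of protein molecular weights, in 1/kDa."""
    z = math.log(mw_kda / mu_kda)
    norm = math.sqrt(2.0 * math.pi) * sigma * mw_kda
    return math.exp(-z * z / (2.0 * sigma ** 2)) / norm


if __name__ == "__main__":
    mode = 31.9 * math.exp(-0.825 ** 2)  # position of the maximum of g
    print(f"mode = {mode:.1f} kDa, g(mode) = {lognormal_pdf(mode):.4f} per kDa")
```

Multiplying this density by $|dM_w/dD_{cyto}|$, obtained from Equations (1), (6) and (7), gives the distribution of DCs described above.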
Fig. 5. Distributions of molecular weights and DCs of cytoplasmic 
proteins in E. coli. (A) Histogram of molecular weights of cytoplasmic 
proteins (created using data from the Uniprot database). The histogram 
is described by the log-normal distribution $g(M_w)$ with standard 
deviation $\sigma = 0.825 \pm 0.007$ and the mean molecular weight 
$\mu = 31.9 \pm 0.3$ kDa. (B) Histogram of DCs of cytoplasmic proteins 
(from our database) and the probability density function 
$p(D_{cyto}(M_w)) = g(M_w)\,|dM_w(D_{cyto})/dD_{cyto}|$ (solid line) 



We also calculated $D_{cyto}$ of membrane proteins, which make up 
~30% of the proteome of E. coli. Membrane proteins, after 
synthesis by the ribosome, are transported to the membrane 
by one of two pathways: the TAT pathway (Sargent, 
2007), in which proteins are transported as single polypeptides 
in a folded state, and the Sec pathway (Driessen and Nouwen, 2008), in 
which unfolded proteins are complexed mainly by one of two 
proteins: SecB or Tig. 

We created a database (Supplementary Table S1) listing the 
DCs of all proteins, including their monomeric forms, the possible 
homo- and hetero-multimers and, in the case of membrane 
proteins, also the complexes with translocation proteins (SecB 
and Tig). Apart from DCs of proteins, we calculated $D_{cyto}$ of 
small molecules such as amino acids or sugars and of macromolecules 
such as RNA or DNA (linear, circular and supercoiled). 
Calculated values of DCs are listed in Table 2. 

The predicted DCs refer only to three-dimensional diffusion. 
In cells, particularly eukaryotes, there are also other types of 



Table 2. Predicted cytoplasmic DCs of small amino acids, sugars, selected 
proteins, ribosomes and DNA constructs 

Molecule            r_p (nm)    D_cyto (μm²/s) 
Guanine               0.29        539 
Histidine             0.32        478 
                      0.33        458 
Arginine              0.34        428 
Lactose               0.41        328 
ATP                   0.43        302 
TrpR-Monomer          2.1          19.71 
TrpR-Dimer            2.7          10.92 
LacI-Monomer          3.2           7.28 
LacI-Tetramer         5.6           1.79 
RNAP Holoenzyme       8.5           0.5 
Ribosome 30S         11.6           0.18 
Ribosome 50S         13.2           0.11 
Ribosome 70S         16.6           0.05 
pYES2               142 (a)         1.13 x 10^-4 
CTD-2657L24         802 (b)         1.62 x 10^-5 

(a) Hydrodynamic radius calculated using Equation (3). 
(b) Hydrodynamic radius calculated using Equation (5). 



transport such as molecular motors (Vale, 2003). Nevertheless, 
mobility, irrespective of the type of motion, is inversely propor- 
tional to the viscosity of the surrounding environment. Since the 
viscosity is dependent on the scale (Holyst et al., 2009; 
Kalwarczyk et al., 2011; Szymanski et al., 2006a, b), each type 
of motion will depend exponentially [Equation (7)] on the size of 
a moving object. 

3.6 Application of DC database in studies of biochemical 
processes occurring in cells 

Using the database of DCs, one can determine quantitatively 
whether the protein diffuses freely or interacts and forms com- 
plexes with much larger macromolecules, e.g. plasmids. 
Capoulade et al. (2011) performed diffusion measurements and 
showed that, in the nucleus of a eukaryotic cell, euchromatin creates 
domains of high and low affinity for heterochromatin protein 
(HP1α). 

Another kind of analysis was performed by Elf et al. (2007). 
The authors compared the in vivo DCs of the lactose repressor in 
its native form and of the lactose repressor devoid of the 
DNA-binding domain. The order of magnitude difference between the 
DCs of the two proteins led to the conclusion that 
the native lactose repressor spends 87% of the time attached to 
the DNA. This shows that the presence of attractive interactions 
between diffusing particles results in a slowdown of diffusion 
of molecules. 

To clarify the method, consider a hypothetical protein of 
hydrodynamic radius $r_p = 3$ nm. The DC of this protein $D_{cyto}$ 
(calculated from the sdVRC) is approximately equal to 
$D_{cyto} = 8.7$ μm²/s. The time required by the protein to visit 
every place in the cell volume [for E. coli $V \approx 0.6$ μm³ 
(Kubitschek, 1990)] is approximately equal to 
$t = V/(4\pi D_{cyto} r_p) \approx 1.8$ s. Now suppose that the protein binds to a 
plasmid whose molecular weight equals 10 000 kDa; the DC of 
the plasmid is of the order of $D_{plasm} = 10^{-4}$ μm²/s. Suppose 
further that the protein spends one-tenth of the time $\tau_f$ diffusing 
freely, and the remaining 90% of the time $\tau_c$ as a complex with the 
plasmid ($\tau_c = 9\tau_f$). The effective DC of the complex $D_{eff}$, 
defined as $D_{eff} = (D_{cyto} + D_c\tau_c/\tau_f)/(1 + \tau_c/\tau_f)$ and, under the 
assumption that $D_c = D_{plasm}$, will be nearly an order of magnitude 
lower than the predicted one ($D_{cyto}$): $D_{eff} = 0.8$ μm²/s. According 
to the above analysis, we can assume that any deviation of an 
experimentally measured DC from the proposed sdVRC will result 
from intermolecular interactions such as specific or non-specific 
binding. 
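
The two estimates in this worked example can be reproduced with a few lines of code. The sketch is illustrative only and the function names are hypothetical. It uses the numbers quoted in the text and assumes $D_c = 10^{-4}$ μm²/s for the plasmid-bound state, a value that has almost no influence on the result because the free-diffusion term dominates.

```python
import math


def search_time_s(volume_um3, d_um2_s, r_p_nm):
    """t = V / (4 * pi * D * r_p), with r_p converted from nm to um."""
    return volume_um3 / (4.0 * math.pi * d_um2_s * r_p_nm * 1e-3)


def effective_dc(d_free, d_complex, bound_to_free_ratio):
    """D_eff = (D_free + D_c * tau_c/tau_f) / (1 + tau_c/tau_f)."""
    return (d_free + d_complex * bound_to_free_ratio) / (1.0 + bound_to_free_ratio)


if __name__ == "__main__":
    # Hypothetical protein from the text: r_p = 3 nm, D_cyto = 8.7 um^2/s.
    print(f"t = {search_time_s(0.6, 8.7, 3.0):.2f} s")
    # 10% of the time free, 90% bound: tau_c / tau_f = 9.
    print(f"D_eff = {effective_dc(8.7, 1e-4, 9.0):.2f} um^2/s")
```

With these inputs the formula gives a search time of about 1.8 s and an effective DC of roughly 0.9 μm²/s, i.e. about one-tenth of $D_{cyto}$, in line with the order-of-magnitude reduction described above.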

3.7 Diffusion in the cytoplasm and the diffusion in 
organelles of eukaryotes 

Prokaryotic cells are characterized by small sizes [the volume of 
E. coli is approximately $V \approx 0.6$ μm³ (Kubitschek, 1990)]. 
Measurements of diffusion in the cytoplasm of these cells are 
performed for the entire volume of the cytoplasm. Thereby, the 
effective DC measured in these experiments is the value averaged 
over the entire volume of the cytoplasm. Because the sdVRC was 
found on the basis of these DCs, in the case of E. coli this curve is 
also averaged over the entire volume of the cell. At this point, it 
should be stressed that the sdVRC should not be used to describe 
diffusion on the cell membrane due to structural differences 
between membrane and cytoplasm, and the two-dimensional 
nature of such diffusion. 

Small sizes of prokaryotic cell also affect the long-time behav- 
iour of diffusing objects. This is known as confined diffusion 
(Ochab-Marcinek and Holyst, 2011). Nevertheless, from the 
normal, three-dimensional DCs (short time diffusion), one can 
draw constructive conclusions. For example, English et al. (2011) 
on the basis of short-time diffusion measurements have charac- 
terized the catalytic cycle of RelA protein. 

Eukaryotic cells are much larger than bacteria. Therefore, 
measurements of diffusion in these cells are easier and can be 
performed in the individual organelles [e.g. nucleus (Pederson, 
2000)]. In previous work, we showed that it is possible to construct 
a reference curve for the cytoplasm of mammalian HeLa 
and Swiss 3T3 cells (Kalwarczyk et al., 2011). However, based 
on comparison of the results obtained by Lukacs et al. (2000) for 
the cytoplasm and the nucleus of HeLa cancer cell, we expect 
that the sdVRC determined for each cellular organelle is differ- 
ent. Furthermore, as sdVRC depends on the structure of the 
environment where diffusion occurs, it should be unique for a 
given cell or even organelle. 

4 CONCLUSION 

The method presented above has a high predictive power. 
Although the error of the method is so far large (40% for proteins), 
the experimentally measured DCs coincide remarkably well with 
the predicted DCs (cf. Fig. 4). Therefore, measurements of DCs 
of several inert probes in a single cell type allow the DCs of 
thousands of proteins and other (macro)molecules to be determined. 
A correctly designed experiment would require different 
experimental techniques (NMR, FRAP, FCS, particle 
tracking) and probes covering a wide range of sizes. 



One needs to know the DC of a given probe in water and/or 
the hydrodynamic radius of this probe. Additionally for the same 
probe, measurements of diffusion in cytoplasm of the cell should 
be performed. Sizes of selected probes should be uniformly dis- 
tributed along the logarithmic scale of sizes. We showed that 
only 20 measurements are required to predict the cytoplasmic 
DC of the typical protein with 20% accuracy. 

Analysis of the sdVRC allows one to determine the characteristic 
length scales $R_h$ and $\xi$, and the DC of any (macro)molecule in the 
cytoplasm. For the cytoplasm of E. coli, we found that $R_h$ is 
surprisingly well correlated with the average radius of the 
DNA loops forming the nucleoid. This suggests that the nucleoid 
is the main crowding agent (responsible for the macroscopic 
viscosity) in the cytoplasm of E. coli. 

Finally, it should be noted that there are no additional requirements 
(except experimental data) for constructing an analogous database 
of DCs in other systems such as the nucleus or 
mitochondria of eukaryotic cells. We also believe that the sdVRC 
can be easily adapted to calculate other types of mobility, including 
one-dimensional sliding, velocity of molecular motors, etc., 
as they are all inversely proportional to the viscosity. 